@Manishearth Manishearth commented Sep 30, 2025

The code and data used to fetch this will be pushed to a separate (private) Unicode repo once we have one. You can find the cleaned-up source data in https://gist.github.com/Manishearth/d8c94a7df22a9eacefc4472a5805322e.

I expect that the post-1950 data will change or be removed with #7006.

The initial motivation here was to fix the apparent ground truth mismatch found in https://github.com/unicode-org/icu4x/pull/7007/files#r2393049682. Turns out it was a different problem, and it has been fixed in #7013.

We may need the same discussion as #6970 about whether we care about these pre-1912 dates, since that is the only period in which this data diverges.

@Manishearth Manishearth changed the title from "Hardcode KASI-derived data" to "Hardcode KASI-derived data; reference year test fixes" Oct 1, 2025
Comment on lines +19 to +30
 PackedChineseBasedYearInfo::new(1900, [s, l, s, s, l, s, l, l, s, l, l, s, l], Some(9), gregorian(1900, 1, 31)),
 PackedChineseBasedYearInfo::new(1901, [s, l, s, s, l, s, l, s, l, l, l, s, s], None, gregorian(1901, 2, 19)),
 PackedChineseBasedYearInfo::new(1902, [l, s, l, s, s, l, s, l, s, l, l, l, s], None, gregorian(1902, 2, 8)),
-PackedChineseBasedYearInfo::new(1903, [s, l, s, l, s, s, l, s, s, l, l, l, s], Some(6), gregorian(1903, 1, 29)),
-PackedChineseBasedYearInfo::new(1904, [l, l, s, l, s, s, l, s, l, s, l, s, s], None, gregorian(1904, 2, 16)),
-PackedChineseBasedYearInfo::new(1905, [l, l, l, s, l, s, s, l, s, l, s, l, s], None, gregorian(1905, 2, 4)),
+PackedChineseBasedYearInfo::new(1903, [s, l, s, l, s, s, l, s, s, l, l, s, l], Some(6), gregorian(1903, 1, 29)),
+PackedChineseBasedYearInfo::new(1904, [l, l, s, l, s, s, l, s, s, l, l, s, s], None, gregorian(1904, 2, 16)),
+PackedChineseBasedYearInfo::new(1905, [l, l, s, l, l, s, s, l, s, l, s, l, s], None, gregorian(1905, 2, 4)),
 PackedChineseBasedYearInfo::new(1906, [s, l, l, s, l, s, l, s, l, s, l, s, l], Some(5), gregorian(1906, 1, 25)),
 PackedChineseBasedYearInfo::new(1907, [s, l, s, l, s, l, l, s, l, s, l, s, s], None, gregorian(1907, 2, 13)),
-PackedChineseBasedYearInfo::new(1908, [l, s, l, s, l, s, l, s, l, l, s, l, s], None, gregorian(1908, 2, 2)),
+PackedChineseBasedYearInfo::new(1908, [l, s, s, l, l, s, l, s, l, l, s, l, s], None, gregorian(1908, 2, 2)),
 PackedChineseBasedYearInfo::new(1909, [s, l, s, s, l, s, l, s, l, l, l, s, l], Some(3), gregorian(1909, 1, 22)),
 PackedChineseBasedYearInfo::new(1910, [s, l, s, s, l, s, l, s, l, l, l, s, s], None, gregorian(1910, 2, 10)),
-PackedChineseBasedYearInfo::new(1911, [l, s, l, s, s, l, s, s, l, l, l, s, l], Some(7), gregorian(1911, 1, 30)),
+PackedChineseBasedYearInfo::new(1911, [l, s, l, s, s, l, s, s, l, l, s, l, l], Some(7), gregorian(1911, 1, 30)),
observation: the 1900-1911 data matches China. one more reason to treat that differently
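
For readers checking rows like the ones quoted above by hand, here is a rough sketch of one possible sanity check. It assumes `s` means a 29-day month and `l` a 30-day month, and that the 13th slot only counts when a leap month is present; the 1900 and 1901 rows are consistent with that reading. `YearRow`, `days_from_civil`, and the `S`/`L` constants are hypothetical stand-ins for this sketch, not the real `PackedChineseBasedYearInfo` API.

```rust
// Assumed meaning of the table's shorthand: short = 29 days, long = 30 days.
pub const S: bool = false;
pub const L: bool = true;

/// Hypothetical, unpacked stand-in for one row of the table.
/// This is NOT the layout of the real `PackedChineseBasedYearInfo`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct YearRow {
    pub related_iso: i32,
    pub month_lengths: [bool; 13],
    pub leap_month: Option<u8>,
    /// Gregorian (year, month, day) of the lunar new year.
    pub new_year: (i64, i64, i64),
}

impl YearRow {
    /// 13 months in a leap year, 12 otherwise (assumption of this sketch).
    pub fn month_count(&self) -> usize {
        if self.leap_month.is_some() { 13 } else { 12 }
    }

    /// Sum of the month lengths that are actually in use.
    pub fn days_in_year(&self) -> i64 {
        self.month_lengths[..self.month_count()]
            .iter()
            .map(|&long| if long { 30 } else { 29 })
            .sum()
    }
}

/// Days since 1970-01-01 for a proleptic Gregorian date
/// (the widely used "days from civil" arithmetic).
pub fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// True when `row`'s month lengths add up to the gap until `next`'s new year.
pub fn row_is_consistent(row: &YearRow, next: &YearRow) -> bool {
    let (y0, m0, d0) = row.new_year;
    let (y1, m1, d1) = next.new_year;
    days_from_civil(y1, m1, d1) - days_from_civil(y0, m0, d0) == row.days_in_year()
}
```

Running `row_is_consistent` over each adjacent pair of rows would flag any row whose month lengths disagree with the following new year date. It cannot tell which of two sources is right, only whether a single table is internally consistent.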

robertbastian added a commit that referenced this pull request Oct 1, 2025
…st (#7013)

In https://github.com/unicode-org/icu4x/pull/7007/files#r2393049682 I
noticed a Korean reference year was incorrect. At first I thought it was
due to a KASI mismatch, but it turns out our algorithm matches KASI from
1912 onwards (#7008), and you
can see the bug in our code too.

The bug was due to `generate_reference_years()` not doing Dangi: it can
be edited to do Dangi, but you have to be careful to update it in two
places. If you only change `new_china()` to `new_dangi()` you get buggy
data.

I was surprised this wasn't caught by the reference year test. It turns
out the reference year test discards all dates that don't successfully
go through MonthDay, and currently, if the produced reference year is
invalid, MonthDay construction errors, so the bad data was silently
skipped.

I updated the test to instead understand which month-day combinations
are valid for each calendar.
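
To illustrate the testing pitfall described above, here is a minimal toy sketch (not the actual ICU4X test). Everything in it is hypothetical: the toy calendar rule in `year_has`, the two lookup functions, and the `month_day` constructor are stand-ins chosen only to show why skipping failed constructions can hide a bad reference year.

```rust
/// Toy rule (assumption): every year has months 1..=12 with days 1..=29,
/// and day 30 exists only in even years.
fn year_has(year: i32, month: u8, day: u8) -> bool {
    (1..=12).contains(&month)
        && ((1..=29).contains(&day) || (day == 30 && year % 2 == 0))
}

/// Month-day combinations that exist in at least one year of the toy calendar.
fn is_valid_month_day(month: u8, day: u8) -> bool {
    (1..=12).contains(&month) && (1..=30).contains(&day)
}

/// Buggy lookup: always returns an odd year, which never contains day 30.
fn buggy_reference_year(_month: u8, _day: u8) -> i32 {
    1971
}

/// Fixed lookup: picks an even year whenever day 30 is requested.
fn fixed_reference_year(_month: u8, day: u8) -> i32 {
    if day == 30 { 1972 } else { 1971 }
}

/// Stand-in for MonthDay construction: errors if the reference year
/// does not actually contain the requested month and day.
fn month_day(
    month: u8,
    day: u8,
    reference_year: fn(u8, u8) -> i32,
) -> Result<(i32, u8, u8), String> {
    let year = reference_year(month, day);
    if year_has(year, month, day) {
        Ok((year, month, day))
    } else {
        Err(format!("{month}-{day} does not exist in reference year {year}"))
    }
}

/// All (month, day) pairs to probe, deliberately including invalid ones.
fn candidates() -> impl Iterator<Item = (u8, u8)> {
    (1..=13u8).flat_map(|m| (1..=31u8).map(move |d| (m, d)))
}

/// Lenient check: drops every failed construction, so a bad reference
/// year can never make it fail.
fn lenient_check(reference_year: fn(u8, u8) -> i32) -> bool {
    candidates()
        .filter_map(|(m, d)| month_day(m, d, reference_year).ok())
        .all(|(y, m, d)| year_has(y, m, d))
}

/// Strict check: construction must succeed exactly for the combinations
/// that are independently known to be valid.
fn strict_check(reference_year: fn(u8, u8) -> i32) -> bool {
    candidates().all(|(m, d)| {
        month_day(m, d, reference_year).is_ok() == is_valid_month_day(m, d)
    })
}
```

In this toy setup, `lenient_check(buggy_reference_year)` passes even though day 30 can never be constructed, while `strict_check(buggy_reference_year)` fails and `strict_check(fixed_reference_year)` passes.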

---------

Co-authored-by: Robert Bastian <[email protected]>
@robertbastian robertbastian changed the title from "Hardcode KASI-derived data; reference year test fixes" to "Hardcode KASI-derived data" Oct 1, 2025
@robertbastian robertbastian merged commit 6bfd662 into unicode-org:main Oct 1, 2025
28 of 31 checks passed
@robertbastian
#6455